Managing Keyword Variation with Frequency Based Generation of Word Forms in IR
Author
Abstract
This paper presents a new method for managing the morphological variation of keywords. The method is called FCG, Frequent Case Generation. It is based on the skewed distributions of word forms in natural languages and is suitable for languages that either have a fair amount of morphological variation or are morphologically very rich. The proposed method has so far been evaluated on four languages, Finnish, Swedish, German and Russian, which show varying degrees of morphological complexity.
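The idea behind the abstract can be illustrated with a minimal sketch: rank a keyword's inflected forms by how often each form type occurs in text, keep only the most frequent ones up to a chosen coverage, and expand the query with those forms. Everything specific in the sketch is an assumption made for illustration and is not taken from the paper: the frequency shares are invented, the paradigm for the Finnish noun "talo" (house) is written by hand, and the names `frequent_forms` and `expand_keyword` are hypothetical.

```python
from typing import Dict, List

# Invented relative frequencies of noun form types in running text.
# Real values would come from corpus statistics for the language.
FORM_SHARE: Dict[str, float] = {
    "nom_sg": 0.30, "gen_sg": 0.22, "part_sg": 0.12, "ines_sg": 0.07,
    "nom_pl": 0.06, "part_pl": 0.05, "ill_sg": 0.04, "gen_pl": 0.04,
    "ela_sg": 0.03, "ade_sg": 0.03,
}

# Hand-written partial paradigm of the Finnish noun "talo" (house).
# A real system would use a morphological generator instead.
TALO: Dict[str, str] = {
    "nom_sg": "talo", "gen_sg": "talon", "part_sg": "taloa",
    "ines_sg": "talossa", "nom_pl": "talot", "part_pl": "taloja",
    "ill_sg": "taloon", "gen_pl": "talojen", "ela_sg": "talosta",
    "ade_sg": "talolla",
}


def frequent_forms(share: Dict[str, float], coverage: float) -> List[str]:
    """Return form tags, most frequent first, until their summed share
    reaches the requested coverage of all occurrences."""
    chosen: List[str] = []
    total = 0.0
    for tag, value in sorted(share.items(), key=lambda item: -item[1]):
        if total >= coverage:
            break
        chosen.append(tag)
        total += value
    return chosen


def expand_keyword(paradigm: Dict[str, str], tags: List[str]) -> List[str]:
    """Generate only the selected forms of one keyword."""
    return [paradigm[tag] for tag in tags if tag in paradigm]


tags = frequent_forms(FORM_SHARE, coverage=0.80)
forms = expand_keyword(TALO, tags)
query = " OR ".join(forms)
print(query)
```

With these invented shares, six form types are enough to reach 80 per cent coverage, so the query contains six surface forms instead of the full paradigm. Raising `coverage` trades a longer query for fewer missed occurrences.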
Similar resources
Is a Morphologically Complex Language Really that Complex in Full-Text Retrieval?
In this paper we show that keyword variation of a morphologically complex language, Finnish, can be handled effectively for IR purposes by generating only the textually most frequent forms of the keyword. Theoretically Finnish nouns have about 2,000 different forms, but occurrences of most of the forms are rare. Corpus statistics showed that about 84 – 88 per cent of the occurrences of inflecte...
Experiments on Automatic Web Page Categorization for IR system
This paper describes keyword-based Web page categorization. Our goal is to embed our categorization technique into information retrieval (IR) systems to facilitate the end-users’ search task. In such systems, search results must be categorized faster, while keeping accuracy high. Our categorization system uses a knowledge base (KB) to assign categories to Web pages. The KB contains a set of cha...
Automatic Generation of Frequent Case Forms of Query Keywords in Text Retrieval
This paper presents implementations of a generative management method for morphological variation of query keywords. The method is called FCG, Frequent Case Generation. It is based on the skewed distributions of word forms in natural languages and is suitable for languages that either have a fair amount of morphological variation or are morphologically very rich. The paper reports implementation an...
Improving the Performance of a Telephone Keyword Spotting System Using Confidence Score Normalization Based on Linear Programming
Conventional word spotting systems use a speech recognizer to determine hypothesized keywords and their confidence scores. These keywords are accepted or rejected by comparing their scores with a specific threshold. It has been shown that the confidence score produced by the recognizer is highly dependent on the sub-word structure of each keyword. So comparing assigned scores to keyw...
BIC based on Modified Droop Control of Hybrid AC/DC Microgrid with PV/Wind/ESS under Variable Generation and Load Conditions
The idea of a microgrid is created by utilizing more diverse ac or dc distributed generation (DG) sources along with an energy storage system (ESS) and loads. The most efficient and reliable selection of ac and dc microgrids is a hybrid ac/dc microgrid. The hybrid microgrid largely overcomes the shortcomings of standalone ac or dc microgrids. A bidirectional interlinking converter (BIC) is util...